Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Posters

Posters at ISMB 2020 will be presented virtually. Authors will pre-record their poster talk (5-7 minutes) and upload it to the virtual conference platform along with a PDF of their poster. All registered conference participants will have access to the posters and presentations during the conference, and the content will remain available until October 31, 2020. A chat function provides Q&A opportunities for interaction between presenters and participants.

Preliminary information on preparing your poster and poster talk is available at: https://www.iscb.org/ismb2020-general/presenterinfo#posters

Ideally authors should be available for interactive chat during the times noted below:


Poster Session A: July 13 and July 14, 7:45 am - 9:15 am Eastern Daylight Time
Poster Session B: July 15 and July 16, 7:45 am - 9:15 am Eastern Daylight Time
July 14 between 10:40 am - 2:00 pm EDT
A bootstrap approach to judge robustness in viral metagenomics
COSI: MICROBIOME COSI
  • Babak Saremi, Tierärztlichte Universität Hannover, Germany
  • Moritz Kohls, Tierärztlichte Universität Hannover, Germany
  • Klaus Jung, Tierärztlichte Universität Hannover, Germany

Short Abstract: A frequent application of next-generation sequencing is to identify viral genomic sequences in the biological sample of an infected host (Kruppa et al., Infect Genet Evol 2019, 66, 180-187). Sequence data can be affected by different technical errors, e.g. due to probe preparation or false base calling, leading to wrongly identified viruses.

Repeatedly sequencing a sample from the infected host to judge robustness is too costly. Therefore, we present a bootstrap approach for re-sampling sequencing data to approximate the robustness of NGS experiments in viral metagenomics. The bootstrap algorithm repeatedly draws sequence reads from an input FASTQ file to obtain B bootstrap FASTQ files. Each FASTQ file is mapped against a reference database of viral sequences, and the B results are summarized by average read counts per virus as well as confidence intervals.

The summary measures provide increased or decreased evidence for an identified virus. To evaluate our new approach, the bootstrap procedure is run on a set of paired-end FASTQ files with known viral sequence content, simulated using the ART tool (Huang et al., Bioinf 2011, 28, 593-594).
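The resampling step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each read has already been assigned to a virus (or to nothing), so resampling the per-read assignments stands in for re-mapping each bootstrap FASTQ file. The function name, the percentile-based interval, and the input format are all assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_read_counts(read_hits, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap per-virus read counts.

    read_hits: one entry per read, either a virus name or None (unmapped).
    Returns {virus: (mean_count, lower_bound, upper_bound)} using a
    simple percentile interval (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_reads = len(read_hits)
    viruses = sorted({v for v in read_hits if v is not None})
    counts = {v: [] for v in viruses}

    for _ in range(n_boot):
        # Draw as many reads as the original sample, with replacement.
        tally = Counter(rng.choices(read_hits, k=n_reads))
        for v in viruses:
            counts[v].append(tally[v])

    summary = {}
    for v in viruses:
        ordered = sorted(counts[v])
        lower = ordered[int((alpha / 2) * (n_boot - 1))]
        upper = ordered[int((1 - alpha / 2) * (n_boot - 1))]
        summary[v] = (sum(ordered) / n_boot, lower, upper)
    return summary
```

A virus whose interval stays well above zero across bootstrap samples would count as more robustly identified than one whose interval includes very low counts.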

A computationally intensive approach to discover novel adaptation genes in extreme environments
COSI: MICROBIOME COSI
  • Tatyana Zamkovaya, University of Florida, United States
  • Ana Conesa, University of Florida, United States

Short Abstract: Successful plant and animal colonization of extra-terrestrial environments requires complete knowledge of the origin and evolution of adaptation mechanisms of life on Earth. The key to understanding how life evolves and adapts to extreme conditions lies in understanding uncharacterized microbes and their gene products. Unfortunately, microbial gene function prediction remains challenging. Existing gene function prediction tools require well-annotated reference genomes, which are currently absent for most uncultivated species. We developed a novel strategy to infer gene function, centered on using 16S rRNA sequences of unknown species as probes, via BLAST search, to find novel genes within functionally annotated scaffolds of a manually-curated comprehensive metagenome database. Screening a multi-TB metagenome database against FASTA sequences of the top 5 unknown and known hub taxa from a previous network analysis resulted in successful retrieval of high-confidence gene-rich scaffold matches, containing numerous novel adaptation-related operons and large proportions of hypothetical genes. Via comparative genomics and bioinformatics, we inferred functions for hypothetical genes from a putative oxidative-stress-related operon. Our approach enables identification and prediction of novel genes while accommodating the computational power required by large-scale datasets. Future comparative genomics analysis on a wider gene subset would enrich understanding of key survival strategies for harsh environments.

ANTIFAM 6.0 : ARE WE CLOSE TO GETTING RID OF SPURIOUS PROTEINS IN SEQUENCE DATABASES?
COSI: MICROBIOME COSI
  • Syed Muktadir Al Sium, Bangladesh Agricultural University, Bangladesh
  • Alex Bateman, European Bioinformatics Institute EMBL-EBI, United Kingdom

Short Abstract: Most protein sequences come from gene prediction tools and have no experimental supporting evidence for their translation. Many spurious protein predictions exist in the sequence databases. The AntiFam database and Spurio software help to identify spurious proteins. The performance of the machine-learning-based tool Spurio depends on AntiFam for its training dataset. AntiFam, a collection of profile-HMMs used to identify spurious protein families, contained only 72 spurious families, and each family required manual curation to be built and verified as spurious. We aimed to increase the coverage of spurious protein identification by increasing the number of entries in AntiFam. We struggled to decide a protein’s spurious status and position in shadow ORFs using Spurio version 1.1. So, we developed a new approach using BLASTX or USEARCH for finding spurious proteins from shadow (-1, -2, -3 frames) and alternate (+2, +3 frames) ORFs. 178 new AntiFam families have been added to the new release (AntiFam 6.0), providing 28,704 new spurious prokaryotic sequences for Spurio. Additionally, protein disorder analysis with IUPred2A showed significant differences between SwissProt and AntiFam sequences. Jointly, these findings help us to get rid of spurious proteins in sequence databases.
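To make the frame terminology concrete, the sketch below extracts the nucleotide sequence read in each of the six frames of a DNA string. It is only an illustration of what the shadow (-1, -2, -3) and alternate (+2, +3) frames are relative to an annotated +1 frame; it is not part of AntiFam or Spurio. The frame-numbering convention used here (negative frames counted from the start of the reverse complement) is an assumption, and other tools number them differently.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def reading_frames(seq):
    """Return the six reading frames of seq, trimmed to whole codons.

    Keys are "+1", "+2", "+3" (forward strand) and "-1", "-2", "-3"
    (reverse complement). The numbering convention is an assumption.
    """
    frames = {}
    rc = reverse_complement(seq)
    for offset in range(3):
        forward = seq[offset:]
        reverse = rc[offset:]
        frames[f"+{offset + 1}"] = forward[: len(forward) - len(forward) % 3]
        frames[f"-{offset + 1}"] = reverse[: len(reverse) - len(reverse) % 3]
    return frames
```

If "+1" is the frame of a real gene, an ORF predicted in any of the other five frames over the same region is a candidate spurious prediction.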

Are metagenomics data sufficiently informative for potential non-invasive diagnosis of inflammatory bowel disease status — Outcomes of the crowdsourced sbv IMPROVER MEDIC challenge
COSI: MICROBIOME COSI
  • Carine Poussin, Philip Morris International R&D, Switzerland
  • Lusine Khachatryan, Philip Morris International R&D, Switzerland
  • Yang Xiang, Philip Morris International R&D, Switzerland
  • Adrian Stan, Philip Morris International R&D, Switzerland
  • James Battey, Philip Morris International R&D, Switzerland
  • Giuseppe Lo Sasso, Philip Morris International R&D, Switzerland
  • Stéphanie Boué, Philip Morris International R&D, Switzerland
  • Nicolas Sierro, Philip Morris International R&D, Switzerland
  • Nikolai V. Ivanov, Philip Morris International R&D, Switzerland
  • Manuel C. Peitsch, Philip Morris International R&D, Switzerland
  • Julia Hoeng, Philip Morris International R&D, Switzerland

Short Abstract: A growing body of evidence links gut microbiota changes with inflammatory bowel disease (IBD), raising the question of the potential benefit of exploiting metagenomics data for non-invasive IBD diagnostics. Open between September 2019 and March 2020, the sbvIMPROVER Metagenomics Diagnosis for Inflammatory Bowel Disease Challenge (MEDIC) investigated computational metagenomics methods for discriminating IBD and non-IBD subjects. For developing and applying models for classifying metagenomics fecal samples, participants were offered the option to start with raw (sub-challenge 1, SC1) or taxonomy- and pathway-based processed (sub-challenge 2, SC2) independent training and test metagenomics data from IBD and non-IBD subjects. We have received and scored a total of 81 anonymized submissions. The results show that many participants’ predictions performed better than random predictions for classifying IBD vs. non-IBD, Ulcerative Colitis (UC) vs. non-IBD, and Crohn’s Disease (CD) vs. non-IBD. However, discrimination of UC and CD remains challenging, with very few submissions reaching the level of significance. Following the challenge, we are conducting an analysis of class predictions and metagenomics features across the teams, including evaluation of the computational methods used to solve the problem. These results will be openly shared with the scientific community to help advance research in the field of IBD.

Assembling reads improves taxonomic classification of species
COSI: MICROBIOME COSI
  • Quang Tran, University of Memphis, United States
  • Vinhthuy Phan, University of Memphis, United States

Short Abstract: Background: Current metagenomic classifiers and profilers employ short reads to classify. Many methods adopt techniques that aim to identify unique genomic regions of genomes so as to differentiate them. Because of this, short-read lengths might be suboptimal. Longer read lengths might improve the performance of classification and profiling. However, longer reads produced by current technology tend to have a higher rate of sequencing errors, compared to short reads. It is not clear if the trade-off between longer length versus higher sequencing errors will increase or decrease classification and profiling performance.

Results: We compared performance of popular metagenomic classifiers on short reads and longer reads, which are assembled from the same short reads. When using a number of popular assemblers to assemble long reads from the short reads, we discovered that most classifiers made fewer predictions with longer reads and that they achieved higher classification performance on synthetic metagenomic data. Specifically, across most classifiers, we observed a significant increase in precision, while recall remained the same, resulting in higher overall classification performance. On real metagenomic data, we observed a similar trend that classifiers made fewer predictions. This suggested that they might have the same performance characteristics with longer reads.

Assembly graph-based variant discovery reveals novel dynamics in the human microbiome
COSI: MICROBIOME COSI
  • Jay Ghurye, Dovetail Genomics, United States
  • Todd Treangen, Rice University, United States
  • Sergey Koren, NIH, United States
  • Marcus Fedarko, University of California, San Diego, United States
  • Harihara Subrahmaniam Muralidharan, University of Maryland, College Park, United States
  • Jacquelyn S Meisel, University of Maryland, College Park, United States
  • Mihai Pop, University of Maryland, College Park, United States

Short Abstract: Sequence variation within metagenomes reveals important information about the structure, function, and evolution of microbial communities. However, most existing methods for variant detection are reference-dependent and are limited to identifying single nucleotide polymorphisms (SNPs), missing more complex structural changes. We developed MetaCarvel (github.com/marbl/MetaCarvel), a reference-independent tool that incorporates paired-end read information to link together contigs into confident scaffolds and detects a rich set of graph signatures indicative of biologically-relevant variants. We applied MetaCarvel to almost 1,000 metagenomes from the Human Microbiome Project and identified over nine million variants representing insertion/deletion events, complex strain differences, plasmids, and repeats. The majority of identified variants were repeats, some corresponding to mobile genetic elements. Our analysis revealed striking differences in the rate of variation across body sites, highlighting niche-specific mechanisms of bacterial adaptation. We identified more indels and strain variants in the oral cavity than in the comparatively nutrient-rich gut. In particular, we highlight a Streptococcus variant from neighboring sites in the oral cavity suggesting that, despite their close proximity, bacteria within each microenvironment utilize unique approaches for effective colonization. This work highlights the utility of using graph-based variant detection to capture biologically significant signals in microbial populations.

Bait-capture metagenomics for detection of antimicrobial resistance genes and plasmid markers in animals and a mock community
COSI: MICROBIOME COSI
  • Julie Shay, Canadian Food Inspection Agency, Canada
  • Ashley Cooper, Canadian Food Inspection Agency, Canada
  • Attiq Muhammad, Agriculture and Agri-Food Canada, Canada
  • Dominic Poulin-Laprade, Agriculture and Agri-Food Canada, Canada
  • Rahat Zaheer, Agriculture and Agri-Food Canada, Canada
  • Burton Blais, Canadian Food Inspection Agency, Canada
  • Moussa Diarra, Agriculture and Agri-Food Canada, Canada
  • Tim McAllister, Agriculture and Agri-Food Canada, Canada
  • Guylaine Talbot, Agriculture and Agri-Food Canada, Canada
  • Catherine Carrillo, Canadian Food Inspection Agency, Canada
  • Calvin Lau, Canadian Food Inspection Agency, Canada

Short Abstract: Bait-capture is a technique where DNA fragments of interest are enriched before sequencing by hybridizing with biotinylated probes. Bait-capture metagenomics allows for gene detection at lower abundance, and with lower sequencing depth, while still allowing detection of novel gene sequences. We designed baits for 4275 antimicrobial resistance (AMR) genes from the NCBI AMRFinderPlus database and 266 plasmid markers from the PlasmidFinder database. We developed a Galaxy pipeline to detect targets in metagenomic data sets. We tested limit of detection of shotgun data for multiple pipelines using in silico mock communities. We performed shotgun and bait-capture metagenomics on 36 swine, beef, and poultry samples, and a 35-component in vitro mock community. 98% of expected AMR genes in the mock community were detected from just 1.25 million HiSeq reads with a 90% gene coverage cutoff, while only 60% were detected by shotgun sequencing of the same community with 10 million HiSeq reads. The bait-capture approach can detect greater AMR gene diversity compared to shotgun sequencing, although bait-capture may be less sensitive for detecting genes not in the target data set. A higher gene coverage cutoff can be used with bait-capture sequencing, which allows for distinguishing between alleles within an AMR gene family.
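The "gene coverage cutoff" mentioned above can be read as a breadth-of-coverage threshold: a gene counts as detected when reads cover at least a given fraction of its length. The sketch below illustrates that reading. It is an assumption about how such a cutoff is typically computed, not the authors' Galaxy pipeline, and the function names and interval format are hypothetical.

```python
def gene_coverage(gene_length, alignments):
    """Fraction of gene positions covered by at least one read.

    alignments: iterable of (start, end) intervals on the gene,
    0-based and half-open. Overlapping intervals are counted once.
    """
    covered = 0
    covered_up_to = 0  # right edge of the region counted so far
    for start, end in sorted(alignments):
        start = max(start, covered_up_to, 0)
        end = min(end, gene_length)
        if end > start:
            covered += end - start
            covered_up_to = end
    return covered / gene_length


def is_detected(gene_length, alignments, cutoff=0.90):
    """Apply a breadth-of-coverage cutoff (0.90 mirrors the 90% above)."""
    return gene_coverage(gene_length, alignments) >= cutoff
```

A stricter cutoff demands that reads span nearly the whole gene, which is what makes it possible to separate closely related alleles that differ only in short regions.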

Benchmarking metagenomic classification tools for long read sequencing data
COSI: MICROBIOME COSI
  • Josip Maric, Faculty of Electrical Engineering and Computing, Croatia
  • Krešimir Križanović, Faculty of Electrical Engineering and Computing, Croatia
  • Mile Šikić, Genome Institute of Singapore, Singapore
  • Sylvain Riondet, National University of Singapore / Genome Institute of Singapore, Singapore
  • Niranjan Nagarajan, Genome Institute of Singapore, Singapore

Short Abstract: In recent years, the fields of both long-read sequencing and metagenomic analysis have been significantly advanced. Although long-read sequencing technologies have been primarily used for de novo genome assembly, they are rapidly maturing for widespread use in other applications. In particular, long reads could potentially lead to more precise taxonomic identification which has sparked an interest in using them for metagenomic analysis.

Here we present a benchmark of several tools for metagenomic taxonomic classification, tested on in-silico datasets that were constructed using real long reads from isolate sequencing. We compared tools that were either newly developed for or modified to work with long reads, including Kraken, Centrifuge, CLARK, MetaMaps and MEGAN-LR. The test datasets were constructed with varying numbers of bacterial and eukaryotic genomes, to simulate different metagenomic applications. The tools were tested on their ability to accurately detect species and precisely estimate species abundances in the samples.

Our analysis showed that all tested classifiers provide useful results, and that accuracy was strongly influenced by the comprehensiveness of the default database used. Using the same database for all tools provided comparable results across methods except for MetaMaps which had slightly better performance, but was slower than k-mer based tools.

Charting the secondary metabolic diversity of 209,211 microbial genomes and metagenome-assembled genomes
COSI: MICROBIOME COSI
  • Satria Ardhe Kautsar, Wageningen University, Netherlands
  • Dick de Ridder, Wageningen University, Netherlands
  • Marnix Medema, Wageningen University, Netherlands
  • Justin Jj van der Hooft, Wageningen University, Netherlands

Short Abstract: Microbial secondary metabolism plays a central role in the community dynamics of the microbiome. The wide arsenal of unique chemical compounds produced by these pathways is used by the microbes to gain survival advantages and to interact with their environment. To investigate this metabolism, genome mining of Biosynthetic Gene Clusters (BGCs) acts as a bridge, linking gene sequences to the chemistry of the compounds they produce. With the large, ever-increasing number of genomes and metagenomes being sequenced, a map of biosynthetic diversity across taxa will help us chart our course in natural product discovery and microbial ecology. Here, we introduce BiG-SLiCE, a highly scalable tool for large-scale clustering analyses of BGC data. Using this new tool, we performed a global homology analysis of 1,225,071 BGCs identified from 188,623 microbial isolate genomes and 20,588 previously published metagenome-assembled genomes in roughly 100 hours of wall-time on a 36-core CPU. The analysis reveals the true extent of microbial product diversity, showing a large amount of potential novelty, especially from environmental microbes. Furthermore, the collection of GCF models it produced may be used in combination with long-read sequencing technology to perform BGC-based functional metagenomics.

Comparative study of the microbiome of the native plant Ceanothus velutinus (snowbrush) from different locations and the effect of the microbiome on the growth of snowbrush in the greenhouse conditions
COSI: MICROBIOME COSI
  • Jyothsna Ganesh, Utah State University, United States
  • Youping Sun, Utah State University, United States
  • Aaron Thomas, Utah State University, United States
  • Amita Kaundal, Utah State University, United States

Short Abstract: Environmental stresses, both biotic and abiotic, affect plant health and reduce crop production. The rhizosphere microbiome of a plant plays a significant role in the plant's defense against various biotic and abiotic stresses. In this study, we are investigating the microbiome diversity of bulk soil, rhizosphere, and endosphere of Ceanothus velutinus, snowbrush. Ceanothus is an evergreen native plant that is usually found in dry areas and thrives in harsh conditions. The snowbrush samples were collected from different locations and elevations in the Tony Grove area of the Intermountain West region of the US. DNA was isolated from all the samples. Sequencing of 16S rRNA (V3 and V4 regions) for bacteria and ITS for fungi was performed. The obtained NGS data for 16S rRNA were analyzed with the QIIME tool to investigate microbial diversity in all the samples. The results revealed the dominance of Proteobacteria (57%, 37%, and 41%) followed by Actinobacteria (27%, 33%, and 28%) in endosphere, bulk, and rhizospheric soil samples, respectively. A significant increase was observed in the growth of snowbrush cutting plants when inoculated with native soil as compared to non-inoculated plants. Investigation of microbial diversity in treated vs. untreated plants is in progress.

Data Pooling to Investigate the Abundance of Gut Microbiome Composition in Autoimmune Disease
COSI: MICROBIOME COSI
  • Bhuwan Rai, University of Toledo, United States
  • Sadik Khuder, University of Toledo, United States

Short Abstract: The gastrointestinal microbiome influences the host immune system, both directly and indirectly. The objective of this study is to identify the microbiome species that are differentially abundant in patients with autoimmune diseases. We compared the gut microbiome compositions in patients with Rheumatoid Arthritis (9), Osteoarthritis (9) and Inflammatory Bowel Disease (IBD; 1203) with those of 45 matched controls. Sequence files were downloaded from the Sequence Read Archive (SRA) using the SRA Toolkit. The R package dada2 (version 1.14.0) was used to construct a sequence table and assign taxonomy. Abundance and diversity were analyzed using the phyloseq R package. Bacteroides, Clostridium, Faecalibacterium, and Ruminococcus are the most commonly abundant species identified across the studies of IBD patients, whereas Prevotella, Bacteroides, Faecalibacterium, and Ruminococcus are the most differentially abundant species identified in rheumatoid arthritis and osteoarthritis compared to the control samples. Microbiome effects on gene expression were also examined in ileum tissue and showed significant differences for SEMG1, PHEX, SLC22A15 and SLC7A11 (p < 0.05) for IBD patients, analyzed with GEO2R on the Gene Expression Omnibus site. In conclusion, our study shows a significant difference in the composition of the gut microbiome in autoimmune diseases, which may serve as a useful diagnostic biomarker.

Discovery and Investigation of Fibrillar Adhesins in Bacterial Proteomes
COSI: MICROBIOME COSI
  • Vivian Monzon, European Molecular Biology Laboratory, European Bioinformatics Institute - EMBL-EBI, Cambridge, United Kingdom
  • Alex Bateman, European Bioinformatics Institute EMBL-EBI, United Kingdom

Short Abstract: Understanding the interactions between bacteria and humans is essential for preventing diseases by pathogens, but also to explain functions of commensal bacteria. Adhesive proteins bind to host cells directly or via components of the extracellular matrix. One type of adhesive proteins, called fibrillar adhesins, is characterised by repeating domains, which fold into a stalk that projects the adhesive domain away from the cell surface.
We could detect fibrillar adhesins in bacteria across various phyla. In Gram-positive bacteria their domains are arranged in a stable architecture, with the adhesive domain at the N-terminus and the stalk at the C-terminus. In Gram-negative bacteria fewer fibrillar adhesins could be found, and the domain architecture often differs from the one described. I am characterising fibrillar adhesins to be able to identify them computationally. I developed a prototype pipeline to find binding proteins by applying their characteristics and tested it on the Staphylococcus aureus NCTC8325 proteome. The pipeline detected 13 out of 14 known adhesive proteins and two potential new ones. Currently, I am working on a machine learning algorithm to find further unknown fibrillar adhesins.

Engineering the microbiome under individualized perturbations
COSI: MICROBIOME COSI
  • Beatriz García-Jiménez, Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Spain
  • Joaquín Medina, Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA) Universidad Politécnica de Madrid, Spain
  • Mark D. Wilkinson, Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA) Universidad Politécnica de Madrid, Spain

Short Abstract: Microbiome dynamics studies highlight the inability to predict the effects of external perturbation on complex microbial communities over time.
MDPbiome contributes to addressing this challenge, in the context of moving microbiome studies from descriptive to translational approaches. MDPbiome is an Artificial Intelligence system built using Markov Decision Processes (MDP) to provide in-silico recommendations (e.g., about diet, pre/pro-biotics, drugs, etc.) to guide the subject’s microbiome through a path towards health or high performance, relying on microbial community changes in response to perturbations.

In addition, given the lack of experimental longitudinal microbiome datasets, we have designed a novel system, called MMODES, to simulate the dynamics of microbial communities under perturbations using genome-scale metabolic models (GEMs). This makes it possible to extend the application of MDPbiome to novel synthetic microbial communities and to suggest interventions that modulate them to preserve or reach a desired state. Interventions can be modifications to the nutrients in the medium or to the microorganisms in the community.

MDPbiome and MDPbiomeGEM (MDPbiome plus MMODES) systems have been successfully applied to several real and simulated case studies in human, animal and soil microbiomes, including chick gut flora maturation, soil decontamination, recovery from Crohn's disease, and avoidance of bacterial vaginosis.
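For readers unfamiliar with Markov Decision Processes, the sketch below shows how a recommendation policy can be derived from a small MDP using generic value iteration. It is not MDPbiome itself: the two microbiome states, the two interventions, the transition probabilities, and the rewards are all invented for illustration.

```python
def q_value(state, action, values, transition, reward, gamma):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward[nxt] + gamma * values[nxt])
        for nxt, prob in transition[state][action].items()
    )


def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-9):
    """Generic value iteration; returns (state values, greedy policy)."""
    values = {s: 0.0 for s in states}
    while True:
        updated = {
            s: max(q_value(s, a, values, transition, reward, gamma) for a in actions)
            for s in states
        }
        delta = max(abs(updated[s] - values[s]) for s in states)
        values = updated
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: q_value(s, a, values, transition, reward, gamma))
        for s in states
    }
    return values, policy


# Hypothetical toy model: every number below is made up for illustration.
STATES = ["dysbiosis", "healthy"]
ACTIONS = ["diet_A", "diet_B"]
TRANSITION = {
    "dysbiosis": {
        "diet_A": {"healthy": 0.8, "dysbiosis": 0.2},
        "diet_B": {"healthy": 0.3, "dysbiosis": 0.7},
    },
    "healthy": {
        "diet_A": {"healthy": 0.6, "dysbiosis": 0.4},
        "diet_B": {"healthy": 0.9, "dysbiosis": 0.1},
    },
}
REWARD = {"dysbiosis": 0.0, "healthy": 1.0}

values, policy = value_iteration(STATES, ACTIONS, TRANSITION, REWARD)
```

In this toy model the policy maps each microbiome state to the intervention most likely to lead towards, or keep the subject in, the rewarded state. That is the general shape of the recommendations an MDP-based system produces.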

Evaluation of the impact of sequencing depth and choice of bioinformatic methods in the taxonomic and functional profiling of the honeybee (Apis mellifera) microbiome
COSI: MICROBIOME COSI
  • Rodrigo Ortega Polo, Agriculture and Agri-Food Canada, Canada
  • Amanda Gregoris, Agriculture and Agri-Food Canada, Canada
  • Lan Tran, Agriculture and Agri-Food Canada, Canada
  • Shefali Vishwakarma, Agriculture and Agri-Food Canada, Canada
  • Marta Guarna, Agriculture and Agri-Food Canada, Canada

Short Abstract: Honeybees make a crucial contribution to agriculture through their pollination services. The diverse members of the honeybee gut microbiome have a symbiotic and/or pathogenic relationship with bees, and they directly impact bee health and immunity. Recent studies have highlighted the role of metagenomic sequencing in gaining a better understanding of both the microbiome composition and the functional profiles of the honeybee gut microbial communities. In this study, we evaluated the impact of sequencing depth on the taxonomic and functional profiling of the honeybee gut microbiome using metagenomic sequencing. Furthermore, we compared analytic methods to characterize de novo metagenomic assemblies, and we compared the resolution of shotgun metagenomic data and 16S rRNA microbial community profiling for the characterization of bacterial members within the honeybee microbiome. In this work we compared existing methods that use the Snakemake and Nextflow workflow management systems for reproducible, portable, and re-usable profiling and de novo assembly. We also developed an R-based reproducible pipeline that integrates DADA2 and phyloseq using the Drake workflow management system. The work presented here will help inform decisions on methodological approaches for large-scale bee microbiome studies tailored to specific research questions in bee health and nutrition.

Genomic inference of metabolic capabilities for human gut bacteria
COSI: MICROBIOME COSI
  • Semen Leyn, Sanford Burnham Prebys Medical Discovery Institute, United States
  • Marat Kazanov, The Institute for Information Transmission Problems RAS, Russia
  • Dmitry Rodionov, Sanford-Burnham-Prebys Medical Discovery Institute, United States

Short Abstract: Genomics-based metabolic reconstruction allows us to assess in silico the metabolic potential of reference microbial species comprising the human gut microbiome (HGM), including their nutrient requirements, utilization, and production capabilities. Capturing these phenotypes in a simple binary (1/0) phenotype matrix (BPM) facilitates further comparative analysis of cumulative metabolic potentials of HGM communities. The number of isolated and sequenced HGM microorganisms is rapidly growing. We developed a computational pipeline for automated propagation of curated metabolic phenotypes from the reference BPM over new genomes from represented phylogenetic groups. Pathway gene orthologs were identified by DIAMOND searches against our reference database. First, we built the phenotype assignment algorithm using the genomic distribution of orthologs and formal phenotype rules for pathways of amino acid and vitamin biosynthesis. Second, we trained several machine learning models on reference sets of genes and phenotypes comprising >70 metabolic pathways in >2,600 reference genomes of bacteria representing the HGM.
The Support Vector Machine and Random Forest models performed the best, achieving >99% accuracy. We tested both approaches on two recently published public sets of >2,000 HGM genomes. The expanded BPM reveals substantial strain/species-level phenotype variations and allowed us to establish taxonomic boundaries of phenotype conservation.
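As an illustration of how a binary phenotype matrix can feed a community-level comparison, the sketch below computes, for each phenotype, the abundance-weighted fraction of a community that carries it. The aggregation rule, the function name, and the example genomes and phenotypes are assumptions made for this example. The abstract does not specify how cumulative metabolic potentials are computed.

```python
def community_phenotype_profile(bpm, abundances):
    """Abundance-weighted fraction of a community carrying each phenotype.

    bpm: {genome: {phenotype: 0 or 1}} (a binary phenotype matrix).
    abundances: {genome: relative abundance in the community}.
    A phenotype missing from a genome's row is treated as 0.
    """
    total = sum(abundances.values())
    phenotypes = sorted({p for row in bpm.values() for p in row})
    return {
        p: sum(abundances[g] * bpm[g].get(p, 0) for g in abundances) / total
        for p in phenotypes
    }


# Hypothetical matrix and abundances, for illustration only.
example_bpm = {
    "genome_1": {"B12_synthesis": 1, "Trp_synthesis": 0},
    "genome_2": {"B12_synthesis": 0, "Trp_synthesis": 1},
    "genome_3": {"B12_synthesis": 1, "Trp_synthesis": 1},
}
example_abundances = {"genome_1": 0.5, "genome_2": 0.25, "genome_3": 0.25}
profile = community_phenotype_profile(example_bpm, example_abundances)
```

Because the matrix is binary, extending it to new genomes only requires a 0/1 call per phenotype, which is what the rule-based and machine-learning approaches above provide.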

Identifying short open reading frames (smORFs) with deep learning
COSI: MICROBIOME COSI
  • Shaojun Pan, Fudan University, China
  • Luis Pedro Coelho, Fudan University, China

Short Abstract: Standard computational gene prediction methods do not predict very short genes. This is due to methodological limitations and to the historical belief that these sequences rarely have a biological function. Recently, however, several groups have demonstrated that there is a wealth of function in these short proteins. The computational difficulties remain, as standard approaches are too prone to false positives when trying to predict smORFs within genomic sequences. Here, we take advantage of a previously published dataset of smORFs, which had used conservation signatures to eliminate likely false positives. We now frame the question as a classification problem and apply a multiple-input convolutional neural network to it. Our classifier achieves 68.9% recall at 48.1% precision (the testing sets do not contain any sequences that are > 80% identical, with at least 90% coverage, to sequences in the training sets). This demonstrates the potential of this approach for identifying smORFs in silico using only their sequences.
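A single number that combines the reported precision and recall is the F1 score, their harmonic mean. The abstract does not report F1, so the value below is derived here from the two figures given and should be read as a convenience calculation only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f1_score(precision=0.481, recall=0.689)
```

With these inputs the F1 score comes out at roughly 0.57, which reflects that about half of the predicted smORFs are expected to be true positives.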

Lorikeet: Strain level genotyping of microbial communities from metagenomes
COSI: MICROBIOME COSI
  • Mikael Bodén, The University of Queensland, Australia
  • Gene Tyson, Queensland University of Technology, Australia
  • Benjamin Woodcroft, Queensland University of Technology, Australia
  • Rhys Newell, Queensland University of Technology, Australia

Short Abstract: Microbial species-level genomes derived from metagenomic samples often represent a consensus genotype of multiple coexisting community members. This consensus genotype will be biased towards the most dominant member within the community making it difficult to assess the genotypic diversity present within the community. The other genotypes within this community may exhibit a range of different phenotypes affecting their nutrient preference, metabolic capabilities, and pathogenicity. Single nucleotide polymorphisms and other genomic variations can be observed by comparing how reads map to a consensus genome providing insight into the genetic diversity within the community. However, combining genomic variations into accurate representations of the genotypes of non-dominant community members has remained a challenge. Here we present Lorikeet, a program for both calling and clustering observed genomic variations into strain-level genotypes. Lorikeet uses proportionality values to observe relationships between variants which are passed to a multi-stage clustering algorithm. This allows for a complete view of the genotypic diversity present within a community. Lorikeet accurately predicts genomic variations and recovers strain-level genotypes from simulated microbial communities. Here, Lorikeet was capable of observing the genotypic diversity within climate-relevant microbial communities derived from thawing permafrost from Stordalen Mire, Sweden.

Machine-learning based prospection of antimicrobial peptides (AMPs) from metagenomes using Macrel
COSI: MICROBIOME COSI
  • Célio Dias Santos-Júnior, Fudan University, China
  • Xing-Ming Zhao, Tongji University, China
  • Shaojun Pan, Fudan University, China
  • Luis Pedro Coelho, Fudan University, China

Short Abstract: Antimicrobial peptides (AMPs) are peptides (≤ 100 residues) with antimicrobial properties, which are used in both clinical and non-clinical environments. Metagenomes and metatranscriptomes present an opportunity to prospect for novel AMPs. However, standard methods do not apply to shorter peptides, as we show empirically. Here, we present MACREL (for Meta(genomic) AMPs Classification and REtrievaL), an end-to-end pipeline that works from metagenomes/metatranscriptomes (in the form of short reads) or genomes (in the form of pre-assembled contigs) and predicts the AMPs therein. Macrel uses random forest classifiers trained with a novel set of 22 descriptors that represent the main AMP features. The effectiveness of Macrel in AMP prediction was benchmarked using realistic simulations and real metagenomic data. Macrel is available as open-source software at github.com/BigDataBiology/macrel and as a web server: big-data-biology.org/software/macrel. We show that Macrel has comparable overall performance (Acc. 94.6%, MCC 0.90) to other state-of-the-art methods, while achieving the highest specificity (99.8%). AMPs are likely to be relatively rare, thus reducing the number of false positives is more important than reducing false negatives. High-quality AMP candidates were recovered, and most were expressed in metatranscriptomes from the same biological samples.
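The metrics quoted above (accuracy, MCC, specificity) are all functions of a binary confusion matrix. The sketch below shows the standard definitions of the Matthews correlation coefficient and specificity. The confusion-matrix counts used in any call are up to the reader; none are taken from the Macrel benchmark.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denominator


def specificity(tn, fp):
    """True negative rate: fraction of real negatives predicted negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0
```

High specificity matters most when positives are rare: with few true AMPs among many candidate peptides, even a small false-positive rate would swamp the real hits.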

Meta-NanoSim: metagenome simulator for nanopore reads
COSI: MICROBIOME COSI
  • Theodora Lo, Canada's Michael Smith Genome Sciences Centre, Canada
  • Chen Yang, Canada's Michael Smith Genome Sciences Centre, Canada
  • Ka Ming Nip, Canada's Michael Smith Genome Sciences Centre, Canada
  • Saber Hafezqorani, Canada's Michael Smith Genome Sciences Centre, Canada
  • Rene L. Warren, BC Cancer Genome Sciences Centre, Canada
  • Inanc Birol, Canada's Michael Smith Genome Sciences Centre, Canada

Short Abstract: As a long-read sequencing technique, Oxford Nanopore Technology (ONT) has shown unprecedented potential in metagenomic studies. However, the challenges associated with ONT reads, such as high error rate and non-uniform error distributions, necessitate analytical tools designed specifically for long reads. To facilitate the development and benchmarking of such tools, simulated datasets with known ground truth are desirable. Here, we present Meta-NanoSim, a fast and lightweight ONT read simulator that characterizes and simulates the unique properties of ONT metagenomes, including abundance levels, chimeric reads, and reads that span both ends of a circular genome. Given the empirical profiles and abundance profile learnt from an experimental dataset, Meta-NanoSim generates multi-sample, multi-replicate metagenome datasets that simulate microbial communities with both circular and linear genomes. To demonstrate its performance, we train Meta-NanoSim with two mock microbial community standards and compare the simulation results against state-of-the-art tools. Further, we showcase the application of Meta-NanoSim through benchmarking ONT metagenome assemblers on our simulated datasets. Gold standards provided by Meta-NanoSim will facilitate the development of algorithms and pipelines in metagenomics, including functional gene prediction, species detection, comparative metagenomics, and clinical diagnosis. As such, we expect Meta-NanoSim to have an enabling role in the field.

MiMeNet: Exploring the Microbiome-Metabolome Relationships using Neural Networks
COSI: MICROBIOME COSI
  • Derek Reiman, University of Illinois at Chicago, United States
  • Yang Dai, University of Illinois at Chicago, United States

Short Abstract: The microbial community has been shown to be involved in host development as well as the pathogenesis of various diseases. The microbial community is believed to functionally interact with its host at a metabolic level through symbiotic interactions and co-metabolism. Recent studies are beginning to highlight various metabolic dysregulations leading to the development of metabolic diseases. However, there is a lack of metabolomic data as it is costly and difficult to obtain. Therefore, the ability to predict unknown metabolomic profiles using microbial features would be extremely useful. Here, we describe MiMeNet, a neural network model to predict the metabolomic profile from microbial features. Using three paired microbiome-metabolomic datasets, we show that MiMeNet has superior predictive performance compared to state-of-the-art linear models. In particular, MiMeNet uses data from one cohort of patients with inflammatory bowel disease to accurately predict the metabolomic profile of a second external cohort. Additionally, MiMeNet can be used to interpret the underlying structure of the microbe-metabolite interaction network, providing insights into the causes of metabolic dysregulation in disease which could allow for future hypothesis generation.

Multi-omics analysis in the field reveals a close association between bacterial communities and mineral properties in the soybean rhizosphere
COSI: MICROBIOME COSI
  • Shinichi Yamazaki, Tohoku Medical Megabank Organization, Tohoku University, Japan
  • Hossein Mardani-Korrani, Tokyo University of Agriculture and Technology, Japan
  • Rumi Kaida, Tokyo University of Agriculture and Technology, Japan
  • Yoshiharu Fujii, Tokyo University of Agriculture and Technology, Japan
  • Kumiko Ochiai, Kyoto University, Japan
  • Masaru Kobayashi, Kyoto University, Japan
  • Akifumi Sugiyama, Kyoto University, Japan
  • Yuichi Aoki, Tohoku Medical Megabank Organization, Tohoku University, Japan

Short Abstract: The rhizosphere, an interface region between the plant root and soil, can directly affect plant growth and development. Over the past century, it has been assumed that the rhizosphere establishes a characteristic environment including microbiota, metabolites, and minerals that is different from the outer soil region (bulk soil). However, holistic insights into the rhizosphere and molecular mechanisms of the formation of this characteristic environment are not well understood. In the present study, we investigated the spatiotemporal dynamics of the root-associated environment in actual field conditions by multi-omics analyses (mineral, microbiome, and transcriptome) of the soybean rhizosphere. Mineral and microbiome analyses demonstrated a characteristic rhizosphere environment in which most of the essential nutrients for plants were highly accumulated and bacterial communities were distinct from those in the bulk soil. Mantel’s test and co-abundance network analysis revealed that characteristic community structures and dominant bacterial taxa in the rhizosphere significantly interact with mineral contents in the rhizosphere, but not in the bulk soil. Regression analysis also revealed that bacterial composition was associated with mineral contents in the rhizosphere. Our field multi-omics analysis suggests a rhizosphere-specific close association between the microbiota and mineral environment.

Phenotypic characterization of complex microbial communities
COSI: MICROBIOME COSI
  • Dmitry Rodionov, Sanford-Burnham-Prebys Medical Discovery Institute, United States
  • Stanislav Iablokov, Institute for Information Transmission Problems, Russia

Short Abstract: Metabolic capabilities (phenotypes) of each microbial species are defined by the presence or absence of pathways encoded in their respective genomes. We reconstructed >70 metabolic pathways in >2,600 reference genomes of bacteria representing the human gut microbiome (HGM) and assigned metabolic phenotypes for (i) utilization of primary sources of energy/carbon (sugars, amino acids); (ii) synthesis of essential nutrients (vitamins/cofactors, amino acids); (iii) excretion of fermentation end-products (short-chain fatty acids). Capturing these phenotypes in a simple binary (1/0) phenotype matrix (BPM) facilitates comparative analysis of the cumulative metabolic potential of microbial communities. To enable metabolic phenotype profiling of microbiomes, we established a computational pipeline that converts 16S metagenomic profiles into Community Phenotype Profiles composed of Community Phenotype Index (CPI) values, each giving the fractional representation of organisms with a given “1”-phenotype (vitamin prototrophs, sugar utilizers, etc.). We applied this approach to assess the distribution of metabolic capabilities in several large HGM datasets from healthy and sick subjects. We also introduce the concept of phenotypic diversity, defined as the diversity of the subcommunity of organisms possessing a particular metabolic phenotype. The resulting functional diversity metrics (Alpha and Beta diversity of phenotypes) reflect phenotype distribution in microbiome samples and can be used to train machine learning models for sample classification.
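
One way to read the Community Phenotype Index is as the share of a sample's total abundance contributed by taxa whose binary phenotype is 1. The sketch below illustrates that reading only; it assumes the 16S profile has already been mapped to reference taxa, treats taxa missing from the phenotype matrix as 0, and uses hypothetical taxon and phenotype names. The authors' pipeline may differ in all of these respects.

```python
def community_phenotype_index(abundances, phenotype_matrix, phenotype):
    """Fraction of total abundance carried by taxa with phenotype == 1.

    abundances: dict mapping taxon -> abundance (counts or relative abundance)
    phenotype_matrix: dict mapping taxon -> {phenotype name: 0 or 1}
    phenotype: name of the phenotype column to summarise
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("sample has no abundance to normalise")
    carriers = sum(
        amount
        for taxon, amount in abundances.items()
        # Assumption: taxa absent from the matrix count as non-carriers (0).
        if phenotype_matrix.get(taxon, {}).get(phenotype, 0) == 1
    )
    return carriers / total


# Hypothetical sample and binary phenotype matrix, for illustration only.
sample = {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10}
bpm = {
    "taxon_A": {"B12_synthesis": 1, "lactose_utilization": 0},
    "taxon_B": {"B12_synthesis": 0, "lactose_utilization": 1},
    "taxon_C": {"B12_synthesis": 1, "lactose_utilization": 1},
}

profile = {
    name: community_phenotype_index(sample, bpm, name)
    for name in ("B12_synthesis", "lactose_utilization")
}
print(profile)
```

Computing one index per phenotype column yields a vector per sample, which is the kind of fixed-length feature representation that sample classifiers can be trained on.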

PIRATE: Phage Identification fRom Assembly-graph varianT Elements
COSI: MICROBIOME COSI
  • Harihara Subrahmaniam Muralidharan, University of Maryland, College Park, United States
  • Jacquelyn S Meisel, University of Maryland, College Park, United States
  • Nidhi Shah, University of Maryland, College Park, United States
  • Mihai Pop, University of Maryland, College Park, United States

Short Abstract: Bacteriophages are viruses that infect and destroy bacteria. As bacteria rapidly evolve to counter the effect of antibiotic drugs, bacteriophages are being explored as complements and alternatives to antibiotics. Identification and characterization of novel phage from sequencing data is critical to achieve this goal, but presents many computational challenges. We developed MetaCarvel (github.com/marbl/MetaCarvel), a scaffolding tool that detects assembly graph motifs representative of biologically relevant variants. Some bubble and repeat motifs detected by MetaCarvel represent phage integration events, providing the opportunity for detecting novel phage within microbial communities. Bubbles, indicating genomic insertion/deletion events or strain variants, may contain specialist phage, while repeat elements may capture generalist phage, common to multiple closely related bacterial hosts. Our assembly-graph-based methods were able to detect crAssphage (the first computationally identified phage) within variants in 208 human gut microbiome samples. To identify novel phage in metagenomes, we extracted repeat and bubble contigs (unitigs) that did not share sufficient similarity with known organisms. We clustered contigs with similar genomic content and blasted predicted genes from each cluster against the UniProt phage database. Multiple clusters contained sequences rich in integrase genes, tail proteins and tape measure proteins, suggesting these sequences represent genomic fragments from previously uncharacterized phage.

PLoT-ME: Pre-classification of Long-reads for Memory Efficient Taxonomic assignment
COSI: MICROBIOME COSI
  • Sylvain Riondet, National University of Singapore / Genome Institute of Singapore, Singapore
  • Niranjan Nagarajan, Genome Institute of Singapore, Singapore

Short Abstract: With increasing feasibility, long-read metagenomics can enable high-resolution taxonomic analysis in a range of applications from diagnostics to forensics. The ease of access via portable long-read platforms (e.g. MinION) is in contrast to the need for significant memory resources when classifiers try to provide precise read assignments (to strain or sub-strain level) or identify a wider set of organisms (e.g. large eukaryotes). To address this, memory-efficient taxonomic classifiers are an active area of research, with methods based on compact indexes providing various tradeoffs between memory usage and speed.

Here we present a general-purpose strategy (PLoT-ME) that leverages the information in k-mer frequency (3-5bp) spectra of long reads to pre-classify them, allowing existing classifiers to further assign them against subsets of the reference database.

Evaluation on mock communities (real reads) shows that PLoT-ME’s fast K-means classifier provides a scalable, compact approach to rapidly pre-classify long error-prone reads (PacBio, Oxford Nanopore) without loss in classification performance. PLoT-ME was found to be robust to a range of read lengths (500bp-10kbp) and provides up to an order-of-magnitude reduction in memory requirements. We envisage that with further improvements in long-read metagenomic classifiers, this approach will enable a general-purpose strategy for high-resolution, low-memory microbiome analysis.
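
The pre-classification idea can be sketched as follows: turn each read into a k-mer frequency vector and route it to the nearest cluster centroid, so that only the matching subset of the reference database needs to be loaded. This is an illustrative sketch under stated assumptions, not PLoT-ME's implementation: the centroids here are built from two toy sequences rather than fitted by K-means, and details such as canonical k-mers or normalisation are not given in the abstract.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return the relative frequency of every DNA k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def nearest_cluster(vector, centroids):
    """Index of the centroid closest to the vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


# Hypothetical centroids: in practice these would come from K-means fitted on
# reference genomes; here two toy sequences stand in for two clusters.
centroids = [kmer_frequencies("AT" * 50), kmer_frequencies("GC" * 50)]

reads = ["ATATATATATGATATAT", "GCGCGCGCGCGC"]
for read in reads:
    cluster = nearest_cluster(kmer_frequencies(read), centroids)
    print(read, "-> cluster", cluster)
```

Each cluster would correspond to one subset of the reference database, so the downstream classifier only has to hold the index for that subset in memory.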

Reconstructing unculturable microbial genomes from clinically relevant stool samples using mixed sequencing and bioinformatic approaches
COSI: MICROBIOME COSI
  • Ben Callaghan, Dalhousie University, Canada
  • André Comeau, Dalhousie University, Canada
  • Morgan Langille, Dalhousie University, Canada

Short Abstract: The potential for long-read sequencing in producing de novo bacterial genomes/metagenomes is well-documented. While more costly, long-read sequencing can resolve complex genomes with more completeness and accuracy. Hybrid assembly methods promise more cost-effective genomes via the integration of cheaper short-read sequences. The quantities of short and long reads required for hybrid approaches are important considerations for any genomic/metagenomic effort. Both the minimum number of reads and the number at which diminishing returns to completeness/accuracy occur inform targets for sequencing depth, with implications for project cost and feasibility. In addition, any genome or metagenome assembly must begin with the choice of assembler, which can be difficult given the variety of methods. To these ends, we sequenced two bacterial genome samples (using Illumina and PacBio sequencing) to benchmark and validate several assembly methods: SPAdes (short read/hybrid assembly), Canu (long read assembly), and Canu/HG-CoLoR (polished assembly). We performed similar analyses for two clinical metagenomic samples. Hybrid assembly was less expensive computationally, produced genomes with better assembly quality than either Illumina or PacBio alone, and converged with sparser input data. We discuss practical constraints of sequencing depths needed for real-world applications of hybrid approaches in genomic/metagenomic assembly.

Studying the dynamics of the gut microbiota using metabolically stable isotopic labeling and metaproteomics
COSI: MICROBIOME COSI
  • Patrick Smyth, University of Ottawa, Canada
  • Xu Zhang, University of Ottawa, Canada
  • Zhibin Ning, University of Ottawa, Canada
  • Janice Mayne, University of Ottawa, Canada
  • Jasmine Moore, University of Ottawa, Canada
  • Krystal Walker, University of Ottawa, Canada
  • Daniel Figeys, University of Ottawa, Canada
  • Mathieu Lavallée-Adam, University of Ottawa, Canada

Short Abstract: The gut microbiome and its metabolic processes are dynamic systems. Surprisingly, our understanding of gut microbiome dynamics is limited. Here we report a metaproteomic workflow that involves protein stable isotope probing (protein-SIP) and identification/quantification of partially labeled peptides. We also developed a package, which we call MetaProfiler, that corrects for false identifications and performs phylogenetic and time series analysis for the study of microbiome dynamics. From the stool samples of five mice that were fed with 15N hydrolysate from Ralstonia eutropha, we identified 15,297 non-redundant unlabeled peptides, for 10,839 of which the heavy counterparts were quantified. These results revealed that i) isotope incorporation in proteins differed between taxa, ii) the rate of protein synthesis was lower in the microbiota than in mice, and iii) differences in protein synthesis appeared across protein functions. Interestingly, the phylum Verrucomicrobia and the genera Akkermansia, Lactobacillus, and Ruminococcus had not reached a plateau of isotopic incorporation 43 days after the continuous introduction of the isotope. Altogether, our study provides an efficient workflow for the study of dynamics of gut microbiota, and our findings help to better understand the complex host-microbiome interactions.

Systems Modelling of the Skin Microbiome
COSI: MICROBIOME COSI
  • Rachita Kumar, SASTRA Deemed University, India
  • Karthik Raman, Indian Institute of Technology Madras, India

Short Abstract: The skin microbiome is a complex ecosystem comprising diverse microorganisms that inhabit distinct microenvironments. The nature of interactions within the microbial community and with the host has far-reaching implications for skin health and disease. In this study, we present several insights into these interactions, through a systems-level graph-theoretic analysis of the skin microbiome, by studying the metabolic networks of constituent organisms. We illustrate several dependencies amongst the skin species by considering the possible metabolic exchanges that can take place, and also look into the preferred mechanisms, if any, adopted by species to support each other. Using MetQuest, a graph-theoretic algorithm previously developed in our laboratory, we quantify the extent and nature of support one organism offers to another within a community, which forms the basis of our community. Interestingly, we observe a clear dependence of Staphylococci and other species on Corynebacteria. We further show that the metabolic exchanges between organisms are dominated by enzyme classes such as transferases and translocases, across many genera. Our approach presents several testable hypotheses for experimental verification and is a generic approach to unravel the complexity of diverse microbiomes, as seen through the lens of metabolism.

Using Conditional Generative Adversarial Networks to Boost the Performance of Machine Learning in Microbiome Datasets
COSI: MICROBIOME COSI
  • Derek Reiman, University of Illinois at Chicago, United States
  • Yang Dai, University of Illinois at Chicago, United States

Short Abstract: The microbiome of the human body has been shown to have profound effects on physiological regulation and disease pathogenesis. However, association analysis based on statistical modeling of microbiome data has continued to be a challenge due to inherent noise, the complexity of data, and the high cost of collecting a large number of samples. To address this challenge, we employed a deep learning framework to construct a data-driven simulation of microbiome data using a conditional generative adversarial network. Conditional generative adversarial networks train two models against each other while leveraging side information to learn from a given dataset and compute larger simulated datasets that are representative of the original dataset. In our study, we used a cohort of patients with inflammatory bowel disease to show that not only can the generative adversarial network generate samples representative of the original data based on multiple diversity metrics, but also that training machine learning models on the synthetic samples can improve disease prediction through data augmentation. Additionally, we show that the synthetic samples generated from this cohort can boost disease prediction in a different external cohort.

Using scaffolds to improve the contiguity and quality of metagenomic bins
COSI: MICROBIOME COSI
  • Harihara Subrahmaniam Muralidharan, University of Maryland, College Park, United States
  • Jacquelyn S Meisel, University of Maryland, College Park, United States
  • Nidhi Shah, University of Maryland, College Park, United States
  • Mihai Pop, University of Maryland, College Park, United States

Short Abstract: Metagenomics has revolutionized the field of microbiology; however, reconstructing complete genomes of organisms from metagenomic data is still challenging. Recovered genomes are often fragmented due to repeats within and across genomes, uneven abundance of organisms, sequencing errors, and strain-level variations within a single sample. To address the fragmented nature of metagenomic assemblies, scientists rely on a process called binning, which clusters together contigs that are inferred to originate from the same organism. Existing binning algorithms use oligonucleotide frequencies and contig abundance (coverage) within and across samples to group together contigs from the same organism. However, these algorithms often miss short contigs and contigs from regions with unusual coverage or DNA composition characteristics, such as mobile elements. Here we propose that information from assembly graphs can assist current strategies for metagenomic binning. We use MetaCarvel, a metagenomic scaffolding tool, to construct assembly graphs where contigs are nodes and edges are inferred based on mate pair or paired-end reads. We show that binning scaffolds, rather than contigs, improves the contiguity and quality of the resulting bins on a mock community and within five stool samples from the Human Microbiome Project.